Automatic Determination of K in Distributed K-Means Clustering
Authors
Abstract
Related resources
Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm
Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar, in some sense or another, to each other than to those in other groups. It is a main task of exploratory data mining and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...
Distributed k-Means and k-Median Clustering on General Topologies
This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. We p...
K-Means Clustering with Distributed Dimensions (Supplement)
1. Proof of Theorem 4. Proof. It is easy to verify the communication cost, and thus we focus on the proof for the approximation ratio below. Similar to the proof of Theorem 1, the grid $G$ is rewritten as $\{g_1, \cdots, g_m\}$ where $m = (k + z)$, and for each $g_j$, its corresponding intersection $\bigcap_{l=1}^{T} M^{l}_{i_l}$ is rewritten as $S_j$. Meanwhile, we denote the index-set indicating the outliers obtained by our al...
Distributed Document Clustering Using K-Means
Document clustering, one of the traditional data mining techniques, is an unsupervised learning paradigm in which clustering methods try to identify the inherent grouping of text documents. The importance of document clustering emerges from the massive volumes of textual documents created. Also, with the continuing development of information technology, data sets in many domains are reaching beyond pe...
Journal
Journal title: Procedia Computer Science
Year: 2019
ISSN: 1877-0509
DOI: 10.1016/j.procs.2020.01.050